Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder
Authors
Abstract
We present CisFinder, software that generates a comprehensive list of motifs enriched in a set of DNA sequences and describes each motif with a position frequency matrix (PFM). A new algorithm estimates PFMs directly from counts of n-mer words, with and without gaps. The PFMs are then extended over gaps and flanking regions and clustered to produce a non-redundant set of motifs. Using published chromatin immunoprecipitation sequencing (ChIP-seq) data, the algorithm identified the binding motifs of 12 transcription factors (TFs) in embryonic stem (ES) cells. CisFinder also identified alternative binding motifs of TFs (e.g., POU5F1, ESRRB, and CTCF), as well as motifs of known and unknown co-factors of genes associated with the pluripotent state of ES cells. Finally, CisFinder performed robustly in identifying motifs that were only slightly enriched in a set of DNA sequences.
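The abstract names the core idea, estimating a PFM directly from n-mer word counts, without giving details. The following is a minimal Python sketch of that idea under simplifying assumptions and is not CisFinder's published algorithm. The helper names (`count_words`, `top_enriched`, `estimate_pfm`) are hypothetical, and so are the add-one smoothed enrichment ratio, the background-scaled "excess count", and the pseudocount. In this sketch the most enriched word serves as a seed, and each PFM column is built from the excess foreground counts of the seed's single-substitution variants. Gapped words are formed by skipping the gap positions. Extension over gaps and flanks, as well as motif clustering, are not shown.

```python
# Hypothetical simplified sketch; NOT CisFinder's published implementation.
from collections import Counter

BASES = "ACGT"


def count_words(seqs, n, gap=0):
    """Count n-mer words in seqs. If gap > 0, that many positions in the middle
    of each word are skipped (a gapped word). Words with non-ACGT letters are ignored."""
    counts = Counter()
    half = n // 2
    span = n + gap
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - span + 1):
            word = s[i:i + half] + s[i + half + gap:i + span]
            if all(c in BASES for c in word):
                counts[word] += 1
    return counts


def top_enriched(fg, bg, min_count=2):
    """Return the word with the highest foreground/background frequency ratio
    (add-one smoothing, an assumption) among words seen >= min_count times in fg."""
    fg_total = sum(fg.values()) or 1
    bg_total = sum(bg.values()) or 1

    def ratio(w):
        return ((fg[w] + 1) / fg_total) / ((bg.get(w, 0) + 1) / bg_total)

    candidates = [w for w, c in fg.items() if c >= min_count]
    return max(candidates, key=ratio) if candidates else None


def estimate_pfm(seed, fg, bg, pseudo=0.5):
    """Estimate a PFM for `seed` from counts of its single-substitution variants.

    Weight of base b in column j = foreground count of the word equal to `seed`
    with position j set to b, minus its background count scaled to the
    foreground total, floored at 0, plus a pseudocount. Columns are normalized."""
    scale = (sum(fg.values()) or 1) / (sum(bg.values()) or 1)
    pfm = []
    for j in range(len(seed)):
        weights = {}
        for b in BASES:
            variant = seed[:j] + b + seed[j + 1:]
            excess = fg.get(variant, 0) - scale * bg.get(variant, 0)
            weights[b] = max(excess, 0.0) + pseudo
        total = sum(weights.values())
        pfm.append({b: weights[b] / total for b in BASES})
    return pfm


if __name__ == "__main__":
    fg = count_words(["GGTAATGACC", "ATAATGAT", "CTAATGAG"], 6)
    bg = count_words(["GGCCAAGTTC", "CATGCGTACA"], 6)
    seed = top_enriched(fg, bg)
    print(seed)  # TAATGA (the only 6-mer seen at least twice in fg)
    for j, col in enumerate(estimate_pfm(seed, fg, bg)):
        print(j, {b: round(p, 2) for b, p in col.items()})
```

Subtracting a scaled background count, rather than using raw foreground counts, is meant to keep words that are common in every sequence set from dominating a column. A real tool would also need to handle reverse complements and overlapping word occurrences, which this sketch ignores.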
Similar resources
A New Exhaustive Method and Strategy for Finding Motifs in ChIP-Enriched Regions
ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing data. State-of-the-art heuristic, exhaust...
A generic motif discovery algorithm for sequential data
MOTIVATION Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. RESULTS Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied ...
Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space
The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs,...
Design of a New Deterministic Algorithm for Finding Common DNA Subsequence
Computational methods have become especially important since the advent of genome projects, whose objective is to decode the entire DNA sequence. Sequence motifs are short, recurring patterns in DNA that are presumed to have a biological function. These motifs are often responsible for similarity or dissimilarity in biological features and their DNA patterns. In this paper, we start with a data...
Protein-DNA Binding: Discovering Motifs and Distinguishing Direct From Indirect Interactions
Dissertation by Raluca M. Gordân, Department of Computer Science, Duke University. Approved by: Alexander J. Hartemink (advisor), Uwe Ohler, Bruce R. Donald, David M. MacAlpine. An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in t...
Journal title:
Volume 16, Issue
Pages: -
Publication date: 2009